Hierarchical classification at multiple operating points
Figure 4: Impact of loss hyper-parameters on the correct-vs-recall trade-off on iNat21-Mini. Table 3 gives the parametrisation corresponding to each loss function.
Table 3: Definition and properties of the parametrisations used by each loss function (e.g. flat softmax, HXE [2]).
Algorithm 1: Algorithm for finding the ordered Pareto set. Square brackets denote array elements (subscripts were used in the main text).
Many classification problems consider classes that form a hierarchy. Classifiers that are aware of this hierarchy may be able to make confident predictions at a coarse level despite being uncertain at the fine-grained level. While it is generally possible to vary the granularity of predictions using a threshold at inference time, most contemporary work considers only leaf-node prediction, and almost no prior work has compared methods at multiple operating points. We present an efficient algorithm to produce operating characteristic curves for any method that assigns a score to every class in the hierarchy. Applying this technique to evaluate existing methods reveals that top-down classifiers are dominated by a naive flat softmax classifier across the entire operating range.
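To make the thresholding idea concrete, here is a minimal sketch (not the paper's algorithm or evaluation code) of inference at a single operating point: given a score for every node in a toy hierarchy, predict the deepest node whose score clears a confidence threshold. The hierarchy, node names, and scores below are hypothetical; sweeping the threshold moves the classifier between coarse, confident predictions and fine-grained, riskier ones.

```python
# Hypothetical toy hierarchy: child -> parent (the root has parent None).
PARENT = {
    "animal": None,
    "dog": "animal",
    "cat": "animal",
    "poodle": "dog",
    "beagle": "dog",
}

def depth(node):
    """Number of edges from the root to `node`."""
    d = 0
    while PARENT[node] is not None:
        node = PARENT[node]
        d += 1
    return d

def predict(scores, threshold):
    """Return the deepest node whose score is at least `threshold`.

    `scores` maps every node in the hierarchy to a confidence in [0, 1].
    We assume ancestors score at least as high as their descendants, so
    lowering the threshold yields strictly finer-grained predictions.
    """
    admissible = [n for n, s in scores.items() if s >= threshold]
    return max(admissible, key=depth)  # deepest admissible node

# Example: confident the image shows a dog, less sure of the breed.
scores = {"animal": 1.0, "dog": 0.9, "cat": 0.05,
          "poodle": 0.55, "beagle": 0.35}
for t in (0.95, 0.8, 0.5):
    print(t, predict(scores, t))  # coarser prediction at higher threshold
```

Evaluating such predictions over a sweep of thresholds is what produces the operating characteristic curves discussed above; the paper's contribution is an efficient algorithm for doing this for any method that scores every class.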